The South Sudan Monitoring and Evaluation Support Project (MESP) is conducting a household resilience survey in 13 counties. The purpose of this household survey is to obtain baseline data in the target areas for the indicators included in the Mission’s Performance Management Plan (PMP) and the Community Roadmap, in support of USAID/South Sudan’s Strategic Framework (2020-2024).
MESP has completed data collection in six counties, and as of October 2021 is carrying out data collection in the remaining seven counties. The study team is currently reviewing the data from six counties in advance of analysis of the full set of 13 counties. This document reviews a selection of measures and indices relating to resilience.
The United States Agency for International Development (USAID) defines resilience as the ability of people, households, communities, countries, and systems to mitigate, adapt to and recover from shocks and stresses in a manner that reduces chronic vulnerability and facilitates inclusive growth (Measuring Resilience in USAID). For USAID-funded activities, resilience is operationalized as a set of capacities that mediate the interaction of shocks and stresses on a measure of household well-being (Tango International 2018).
Common well-being measures include poverty, food insecurity, child wasting, and self-perceived ability to recover from shocks or stresses. Shocks and stresses are measured across a range of specific events (shocks) or risks (stresses) and aggregated to a single measure of exposure. Capacities are measured along three primary indices, with each dimension consisting of its own set of sub-indices:
The Food and Agricultural Organization (FAO) also has a methodology for measuring resilience (FAO 2016), which adds measures such as dietary diversity as a complement to food insecurity.
The South Sudan resilience study reviewed this list of outcomes, shocks, and capacities and selected a subset of them for measuring resilience:
This document will conduct preliminary review of these measures, using household survey data in six counties.
The data comes from a household survey in the counties of Akobo, Budi, Duk, Leer, Pibor, and Uror, using a two-stage sampling procedure where the sampling points, referred to as enumeration areas (EAs), are randomly selected within each county, and households are randomly selected within each EA. The following map illustrates the sampling points across all six counties.
include_graphics(here("exploratory analysis/viz/maps/MESP_HH_Survey_Sampling_Point_ALL.jpg"))
A total sample size of 3,563 households were apportioned relatively evenly across the six counties.
smpl <- svyrdat %>%
group_by(County=county) %>%
summarize(`Population`=survey_total(),
`Sample`=unweighted(n())) %>%
select(1,2,4)
smpl_gt <- smpl %>%
gt() %>%
fmt_number(2:3,
decimals=0)
smpl_gt
| County | Population | Sample |
|---|---|---|
| Akobo | 11,434 | 625 |
| Budi | 13,071 | 620 |
| Duk | 6,275 | 518 |
| Leer | 5,716 | 620 |
| Pibor | 36,526 | 564 |
| Uror | 39,354 | 616 |
Under simple random sampling, and using the measure of belief in local government support in the face of shocks and stresses, the overall margin of error of the survey would be 1.9 percent, per the following table.
dat %>%
summarize(mn = mean(resil_c, na.rm=T),
std.err = std.error(resil_c, na.rm=T),
margin = 1.96*std.err)
| mn | std.err | margin |
|---|---|---|
| 0.409 | 0.00967 | 0.0189 |
Given the clustered nature of the data collection, the complex margin of error overall is 5.9 percent, per the following table:
svymean(~resil_c,
na.rm=T,
deff="replace",
design=svydat) %>%
as.data.frame() %>%
mutate(Margin = 1.96*resil_c,
deft=sqrt(deff))
| mean | resil_c | deff | Margin | deft |
|---|---|---|---|---|
| 0.351 | 0.0302 | 10.3 | 0.0592 | 3.22 |
Across county, the simple margin of error ranges from 3.5 - 5.5 percent, per the following table:
dat %>%
group_by(county) %>%
summarize(mn=mean(resil_c, na.rm=T),
se=std.error(resil_c, na.rm=T),
margin=1.96*se)
| county | mn | se | margin |
|---|---|---|---|
| Akobo | 0.308 | 0.0267 | 0.0524 |
| Budi | 0.622 | 0.0203 | 0.0398 |
| Duk | 0.262 | 0.0197 | 0.0386 |
| Leer | 0.625 | 0.0238 | 0.0466 |
| Pibor | 0.198 | 0.0178 | 0.035 |
| Uror | 0.401 | 0.0282 | 0.0554 |
Accounting for the clustered nature of the data collection, the complex margin of error across county ranges from 3.6 - 19.1 percent, per the following table:
svyrdat %>%
group_by(county) %>%
summarize(mn=survey_mean(resil_c, na.rm=T)) %>%
mutate(margin=1.96*mn_se)
| county | mn | mn_se | margin |
|---|---|---|---|
| Akobo | 0.293 | 0.0431 | 0.0845 |
| Budi | 0.5 | 0.0954 | 0.187 |
| Duk | 0.364 | 0.0977 | 0.191 |
| Leer | 0.63 | 0.0465 | 0.0912 |
| Pibor | 0.202 | 0.0185 | 0.0363 |
| Uror | 0.455 | 0.0862 | 0.169 |
In the exploratory analysis that follows, both weighted or unweighted frequencies may be used. Generally speaking, when reporting an overall measure such as a mean or frequency, weighted measures should be considered as the estimates of their respective population parameters. In other cases where interest may be more in how the measures vary across disaggregations of interest, unweighted measures may be used.
Following (Tango International 2018), the Aspirations index comprises six binary choice survey items, in which respondents either affirm or deny statements relating to their sense of agency in life, or whether the household has concrete plans for the future. The following table shows these items and their overall weighted frequencies.
asp <- svyrdat %>%
group_by() %>%
summarize(asp1=mean(asp1, na.rm=T),
asp2=mean(asp2, na.rm=T),
asp3=mean(asp3, na.rm=T),
asp4=mean(asp4, na.rm=T),
asp5=mean(asp5, na.rm=T),
asp6=mean(asp6, na.rm=T)) %>%
t() %>%
as.data.frame() %>%
rownames_to_column("var_name") %>%
mutate(item_num = c("q_629", "q_630", "q_634", "q_635", "q_632", "q_633"),
Description=c("Each person is responsible for his/her own success or failure in life",
"To be successful one needs to work very hard rather than rely on luck",
"My experience in life has been that was is going to happen will happen (reverse-coded)",
"It is not always good for me to plan too far ahead because many things turn out to be a matter of good or bad fortune (reverse-coded)",
"Hopeful for children's future",
"Desire for their children to graduate from secondary school")) %>%
select(var_name, item_num, Description, Percent=V1)
asp
| var_name | item_num | Description | Percent |
|---|---|---|---|
| asp1 | q_629 | Each person is responsible for his/her own success or failure in life | 0.781 |
| asp2 | q_630 | To be successful one needs to work very hard rather than rely on luck | 0.797 |
| asp3 | q_634 | My experience in life has been that was is going to happen will happen (reverse-coded) | 0.342 |
| asp4 | q_635 | It is not always good for me to plan too far ahead because many things turn out to be a matter of good or bad fortune (reverse-coded) | 0.368 |
| asp5 | q_632 | Hopeful for children's future | 0.839 |
| asp6 | q_633 | Desire for their children to graduate from secondary school | 0.916 |
A simple summation of the six binary items provides an ordered scale ranging from 0-6.
The simple mean and other descriptives
describe(dat$aspirations_index2)
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.45e+03 | 4.04 | 1.16 | 4 | 4.07 | 1.48 | 0 | 6 | 6 | -0.254 | -0.0871 | 0.0197 |
The overall weighted value is 3.9 out of six, with a margin of 0.12. The median is four.
svyrdat %>%
summarize(Aspirations=survey_mean(aspirations_index2, na.rm=T),
Aspirations_med = survey_quantile(aspirations_index2, .5, na.rm=T)) %>%
mutate(margin=1.96*Aspirations_se)
| Aspirations | Aspirations_se | Aspirations_med_q50 | Aspirations_med_q50_se | margin |
|---|---|---|---|---|
| 3.9 | 0.0591 | 4 | 0 | 0.116 |
ggplot(dat, aes(aspirations_index2)) +
geom_bar(width=.2, fill="dodgerblue2") +
geom_vline(xintercept=3.9, size=1, color="darkgoldenrod2") +
geom_vline(xintercept=4, size=1, color="turquoise") +
scale_x_continuous(breaks=0:6)
Unweighted values by county.
asp_cnty <- dat %>%
group_by(county) %>%
summarize(asp = mean(aspirations_index2, na.rm=T),
se = std.error(aspirations_index2)) %>%
mutate(lower=asp-1.96*se,
upper=asp+1.96*se)
asp_cnty
| county | asp | se | lower | upper |
|---|---|---|---|---|
| Akobo | 4 | 0.0475 | 3.9 | 4.09 |
| Budi | 4.42 | 0.055 | 4.32 | 4.53 |
| Duk | 3.95 | 0.0342 | 3.89 | 4.02 |
| Leer | 4.38 | 0.0454 | 4.3 | 4.47 |
| Pibor | 3.54 | 0.0349 | 3.47 | 3.61 |
| Uror | 3.89 | 0.0523 | 3.78 | 3.99 |
ggplot(asp_cnty, aes(asp, fct_reorder(county, asp))) +
geom_vline(xintercept=3.9, size=1, alpha=.7, color="darkgoldenrod2") +
geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") +
geom_point(color="dodgerblue", size=3) +
geom_errorbar(aes(xmin=lower, xmax=upper),
color="dodgerblue",
width=0,
size=1.1) +
scale_x_continuous(limits=c(0,6),
breaks=0:6) +
labs(x="Aspirations index",
y="")
Weighted values by county
asp_cnty_wt <- svyrdat %>%
group_by(county) %>%
summarize(asp = survey_mean(aspirations_index2, na.rm=T)) %>%
mutate(lower=asp-1.96*asp_se,
upper=asp+1.96*asp_se)
asp_cnty_wt
| county | asp | asp_se | lower | upper |
|---|---|---|---|---|
| Akobo | 4.04 | 0.104 | 3.84 | 4.24 |
| Budi | 4.56 | 0.116 | 4.33 | 4.79 |
| Duk | 3.99 | 0.0672 | 3.86 | 4.12 |
| Leer | 4.41 | 0.0557 | 4.3 | 4.52 |
| Pibor | 3.54 | 0.0726 | 3.4 | 3.69 |
| Uror | 3.87 | 0.136 | 3.6 | 4.14 |
ggplot(asp_cnty_wt, aes(asp, fct_reorder(county, asp))) +
geom_vline(xintercept=3.9, size=1, alpha=.7, color="darkgoldenrod2") +
geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") +
geom_point(color="dodgerblue", size=3) +
geom_errorbar(aes(xmin=lower, xmax=upper),
color="dodgerblue",
width=0,
size=1) +
scale_x_continuous(limits=c(0,6),
breaks=0:6) +
labs(x="Aspirations index",
y="")
Assess suitability of measures for index construction using factor analysis
dat %>%
select(asp1:asp6) %>%
fa.parallel(cor="tet")
## Parallel analysis suggests that the number of factors = 3 and the number of components = 3
Three factors or components out of six items is not a good sign.
asp_alpha <- dat %>%
select(asp1:asp6) %>%
alpha()
## Some items ( asp3 asp4 ) were negatively correlated with the total scale and
## probably should be reversed.
## To do this, run the function again with the 'check.keys=TRUE' option
asp_alpha
##
## Reliability analysis
## Call: alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.29 0.29 0.31 0.063 0.41 0.018 0.67 0.19 0.064
##
## lower alpha upper 95% confidence boundaries
## 0.26 0.29 0.33
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## asp1 0.27 0.25 0.27 0.064 0.34 0.019 0.0210 0.028
## asp2 0.22 0.20 0.24 0.048 0.25 0.020 0.0222 -0.011
## asp3 0.27 0.29 0.27 0.075 0.40 0.019 0.0088 0.071
## asp4 0.19 0.22 0.22 0.055 0.29 0.022 0.0108 0.037
## asp5 0.29 0.26 0.27 0.065 0.35 0.018 0.0183 0.033
## asp6 0.29 0.28 0.30 0.073 0.40 0.019 0.0217 0.068
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## asp1 3541 0.46 0.47 0.24 0.12 0.78 0.41
## asp2 3540 0.50 0.52 0.34 0.18 0.80 0.40
## asp3 3540 0.51 0.43 0.23 0.12 0.34 0.47
## asp4 3541 0.59 0.50 0.36 0.21 0.37 0.48
## asp5 3531 0.40 0.46 0.24 0.08 0.84 0.37
## asp6 3477 0.31 0.43 0.16 0.07 0.92 0.28
##
## Non missing response frequency for each item
## 0 1 miss
## asp1 0.22 0.78 0.01
## asp2 0.20 0.80 0.01
## asp3 0.66 0.34 0.01
## asp4 0.63 0.37 0.01
## asp5 0.16 0.84 0.01
## asp6 0.08 0.92 0.02
A reliability (alpha) value of .29 is unacceptable for index formation. This will also exhibit itself in the loadings of an exploratory factor analysis (EFA).
asp_fa <- dat %>%
select(asp1:asp6) %>%
fa(cor="tet")
asp_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "tet")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## asp1 0.31 0.095 0.90 1
## asp2 0.37 0.134 0.87 1
## asp3 -0.28 0.077 0.92 1
## asp4 -0.21 0.044 0.96 1
## asp5 0.79 0.620 0.38 1
## asp6 0.43 0.184 0.82 1
##
## MR1
## SS loadings 1.15
## Proportion Var 0.19
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 15 and the objective function was 0.99 with Chi Square of 3511
## The degrees of freedom for the model are 9 and the objective function was 0.62
##
## The root mean square of the residuals (RMSR) is 0.17
## The df corrected root mean square of the residuals is 0.22
##
## The harmonic number of observations is 3515 with the empirical chi square 3142 with prob < 0
## The total number of observations was 3563 with Likelihood Chi Square = 2210 with prob < 0
##
## Tucker Lewis Index of factoring reliability = -0.049
## RMSEA index = 0.262 and the 90 % confidence intervals are 0.253 0.271
## BIC = 2136
## Fit based upon off diagonal values = 0.5
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.83
## Multiple R square of scores with factors 0.69
## Minimum correlation of possible factor scores 0.39
Only two of the six items have loadings exceeding 0.4, while the reverse-coded items have negative loadings. Let’s try again without the reverse-coded items.
dat %>%
select(asp1:asp2,
asp5:asp6) %>%
fa.parallel(cor="tet")
## Parallel analysis suggests that the number of factors = 2 and the number of components = 2
asp2_alpha
## Error in eval(expr, envir, enclos): object 'asp2_alpha' not found
Two factors or components out of four items is still not a good sign.
asp2_alpha <- dat %>%
select(asp1:asp2,
asp5:asp6) %>%
alpha()
asp2_alpha
##
## Reliability analysis
## Call: alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.36 0.36 0.31 0.12 0.55 0.017 0.83 0.22 0.13
##
## lower alpha upper 95% confidence boundaries
## 0.33 0.36 0.39
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## asp1 0.31 0.32 0.24 0.136 0.47 0.020 0.0038 0.142
## asp2 0.25 0.26 0.20 0.106 0.36 0.021 0.0094 0.122
## asp5 0.25 0.23 0.18 0.091 0.30 0.021 0.0098 0.072
## asp6 0.35 0.35 0.27 0.154 0.55 0.019 0.0015 0.142
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## asp1 3541 0.62 0.57 0.29 0.18 0.78 0.41
## asp2 3540 0.64 0.60 0.36 0.23 0.80 0.40
## asp5 3531 0.62 0.62 0.40 0.23 0.84 0.37
## asp6 3477 0.44 0.54 0.24 0.13 0.92 0.28
##
## Non missing response frequency for each item
## 0 1 miss
## asp1 0.22 0.78 0.01
## asp2 0.20 0.80 0.01
## asp5 0.16 0.84 0.01
## asp6 0.08 0.92 0.02
A reliability score of .36 is still unacceptable. Let’s look at the EFA loadings.
asp2_fa <- dat %>%
select(asp1:asp2,
asp5:asp6) %>%
fa(cor="tet")
asp2_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "tet")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## asp1 0.36 0.13 0.87 1
## asp2 0.47 0.22 0.78 1
## asp5 0.73 0.53 0.47 1
## asp6 0.43 0.19 0.81 1
##
## MR1
## SS loadings 1.07
## Proportion Var 0.27
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 6 and the objective function was 0.44 with Chi Square of 1564
## The degrees of freedom for the model are 2 and the objective function was 0.12
##
## The root mean square of the residuals (RMSR) is 0.11
## The df corrected root mean square of the residuals is 0.19
##
## The harmonic number of observations is 3506 with the empirical chi square 505 with prob < 2.3e-110
## The total number of observations was 3563 with Likelihood Chi Square = 414 with prob < 1.2e-90
##
## Tucker Lewis Index of factoring reliability = 0.206
## RMSEA index = 0.24 and the 90 % confidence intervals are 0.221 0.26
## BIC = 398
## Fit based upon off diagonal values = 0.84
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.80
## Multiple R square of scores with factors 0.64
## Minimum correlation of possible factor scores 0.28
The loadings aren’t terrible, but measures of fit such as Root Mean Square Error of Approximation (RMSEA) and Tucker Lewis Index (TLI) indicate that the factor model should not be used.
Let’s include measures from the locus of control, which is a sub-index of the aspirations index. These items are attitudinal statements:
dat %>%
select(asp1:asp2,
asp5:asp6,
q_636:q_638) %>%
fa.parallel(cor="mixed")
## Parallel analysis suggests that the number of factors = 3 and the number of components = 3
Three factors or components out of seven items, still not a good sign.
asp3_alpha <- dat %>%
select(asp1:asp2,
asp5:asp6,
q_636:q_638) %>%
alpha()
asp3_alpha
##
## Reliability analysis
## Call: alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.55 0.5 0.52 0.13 1 0.0081 2.4 0.52 0.086
##
## lower alpha upper 95% confidence boundaries
## 0.54 0.55 0.57
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## asp1 0.55 0.48 0.50 0.135 0.94 0.0082 0.0271 0.073
## asp2 0.57 0.50 0.51 0.144 1.01 0.0080 0.0258 0.097
## asp5 0.55 0.47 0.48 0.127 0.87 0.0082 0.0284 0.053
## asp6 0.57 0.53 0.53 0.159 1.13 0.0081 0.0225 0.119
## q_636 0.45 0.44 0.45 0.117 0.80 0.0096 0.0152 0.097
## q_637 0.40 0.39 0.40 0.097 0.65 0.0105 0.0136 0.073
## q_638 0.37 0.40 0.39 0.101 0.67 0.0119 0.0096 0.086
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## asp1 3541 0.262 0.46 0.27 0.150 0.78 0.41
## asp2 3540 0.183 0.42 0.21 0.070 0.80 0.40
## asp5 3531 0.249 0.49 0.32 0.157 0.84 0.37
## asp6 3477 0.098 0.36 0.12 0.022 0.92 0.28
## q_636 3540 0.754 0.54 0.44 0.426 4.01 1.62
## q_637 3542 0.737 0.62 0.58 0.505 4.91 1.26
## q_638 3539 0.789 0.61 0.58 0.537 4.71 1.46
##
## Non missing response frequency for each item
## 0 1 2 3 4 5 6 miss
## asp1 0.22 0.78 0.00 0.00 0.00 0.00 0.00 0.01
## asp2 0.20 0.80 0.00 0.00 0.00 0.00 0.00 0.01
## asp5 0.16 0.84 0.00 0.00 0.00 0.00 0.00 0.01
## asp6 0.08 0.92 0.00 0.00 0.00 0.00 0.00 0.02
## q_636 0.00 0.08 0.21 0.04 0.11 0.41 0.15 0.01
## q_637 0.00 0.03 0.06 0.02 0.08 0.44 0.35 0.01
## q_638 0.00 0.06 0.08 0.03 0.08 0.41 0.34 0.01
A reliability score of .55 is an improvement, but still does not reach the desired benchmark of .7.
asp3_fa <- dat %>%
select(asp1:asp2,
asp5:asp6,
q_636:q_638) %>%
fa(cor="mixed")
asp3_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## asp1 0.31 0.097 0.90 1
## asp2 0.24 0.056 0.94 1
## asp5 0.36 0.128 0.87 1
## asp6 0.17 0.028 0.97 1
## q_636 0.54 0.287 0.71 1
## q_637 0.66 0.436 0.56 1
## q_638 0.67 0.444 0.56 1
##
## MR1
## SS loadings 1.47
## Proportion Var 0.21
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 1.11 with Chi Square of 3958
## The degrees of freedom for the model are 14 and the objective function was 0.48
##
## The root mean square of the residuals (RMSR) is 0.13
## The df corrected root mean square of the residuals is 0.16
##
## The harmonic number of observations is 3518 with the empirical chi square 2642 with prob < 0
## The total number of observations was 3563 with Likelihood Chi Square = 1692 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.36
## RMSEA index = 0.183 and the 90 % confidence intervals are 0.176 0.191
## BIC = 1578
## Fit based upon off diagonal values = 0.69
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.83
## Multiple R square of scores with factors 0.70
## Minimum correlation of possible factor scores 0.39
Let’s try just the three locus of control items.
dat %>%
select(q_636:q_638) %>%
fa.parallel(cor="poly")
## Parallel analysis suggests that the number of factors = 1 and the number of components = 1
One factor or component, that’s hopeful.
asp4_alpha <- dat %>%
select(q_636:q_638) %>%
alpha()
asp4_alpha
##
## Reliability analysis
## Call: alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.68 0.69 0.6 0.43 2.2 0.0091 4.5 1.1 0.43
##
## lower alpha upper 95% confidence boundaries
## 0.67 0.68 0.7
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## q_636 0.66 0.66 0.50 0.50 2.0 0.011 NA 0.50
## q_637 0.60 0.60 0.43 0.43 1.5 0.013 NA 0.43
## q_638 0.51 0.52 0.35 0.35 1.1 0.016 NA 0.35
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## q_636 3540 0.79 0.76 0.54 0.46 4.0 1.6
## q_637 3542 0.75 0.78 0.61 0.50 4.9 1.3
## q_638 3539 0.82 0.82 0.68 0.56 4.7 1.5
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## q_636 0.08 0.21 0.04 0.11 0.41 0.15 0.01
## q_637 0.03 0.06 0.02 0.08 0.44 0.35 0.01
## q_638 0.06 0.08 0.03 0.08 0.41 0.34 0.01
Reliability of .68, that’s good enough for an index.
asp4_fa <- dat %>%
select(q_636:q_638) %>%
fa(cor="poly")
asp4_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## q_636 0.60 0.35 0.65 1
## q_637 0.63 0.40 0.60 1
## q_638 0.79 0.62 0.38 1
##
## MR1
## SS loadings 1.38
## Proportion Var 0.46
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 3 and the objective function was 0.57 with Chi Square of 2027
## The degrees of freedom for the model are 0 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic number of observations is 3538 with the empirical chi square 0 with prob < NA
## The total number of observations was 3563 with Likelihood Chi Square = 0 with prob < NA
##
## Tucker Lewis Index of factoring reliability = -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.86
## Multiple R square of scores with factors 0.74
## Minimum correlation of possible factor scores 0.48
library(effectsize)
interpret_nnfi(asp4_fa)
## Error in UseMethod("interpret"): no applicable method for 'interpret' applied to an object of class "c('psych', 'fa')"
I’m not sure how to interpret the fit scores, but I think we can generate an index out of these three items. Let’s impute missing values and generate the index.
locus <- dat %>%
select(county,
hhsize=household_members,
q_636:q_638) %>%
as.data.frame() %>%
mutate(County = factor(county)) %>%
select(-county)
locus[,1:4] <- lapply(locus[,1:4], as.numeric)
#str(locus)
locus_imp <- missForest(locus)
## missForest iteration 1 in progress...done!
## missForest iteration 2 in progress...done!
## missForest iteration 3 in progress...done!
## missForest iteration 4 in progress...done!
locus_imp$OOBerror
## NRMSE PFC
## 0.762 0.000
locus_out <- locus_imp$ximp
Compare means for raw and imputed variables
describe(locus)
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.55e+03 | 4.97 | 2.86 | 5 | 4.78 | 2.97 | 1 | 23 | 22 | 0.707 | 1.51 | 0.048 |
| 2 | 3.54e+03 | 4.01 | 1.62 | 5 | 4.12 | 1.48 | 1 | 6 | 5 | -0.557 | -1.12 | 0.0272 |
| 3 | 3.54e+03 | 4.91 | 1.26 | 5 | 5.17 | 1.48 | 1 | 6 | 5 | -1.59 | 2.03 | 0.0211 |
| 4 | 3.54e+03 | 4.71 | 1.46 | 5 | 4.96 | 1.48 | 1 | 6 | 5 | -1.3 | 0.652 | 0.0246 |
| 5 | 3.56e+03 | 3.48 | 1.73 | 4 | 3.48 | 2.97 | 1 | 6 | 5 | 0.00988 | -1.3 | 0.029 |
describe(locus_out)
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.56e+03 | 4.97 | 2.85 | 5 | 4.78 | 2.97 | 1 | 23 | 22 | 0.706 | 1.51 | 0.0478 |
| 2 | 3.56e+03 | 4.02 | 1.61 | 5 | 4.12 | 1.48 | 1 | 6 | 5 | -0.561 | -1.11 | 0.027 |
| 3 | 3.56e+03 | 4.91 | 1.25 | 5 | 5.17 | 1.48 | 1 | 6 | 5 | -1.59 | 2.06 | 0.021 |
| 4 | 3.56e+03 | 4.71 | 1.46 | 5 | 4.96 | 1.48 | 1 | 6 | 5 | -1.31 | 0.667 | 0.0244 |
| 5 | 3.56e+03 | 3.48 | 1.73 | 4 | 3.48 | 2.97 | 1 | 6 | 5 | 0.00988 | -1.3 | 0.029 |
a <- locus %>%
mutate(id=1:3563) %>%
filter(is.na(hhsize) |
is.na(q_636) |
is.na(q_637) |
is.na(q_638))
a_id <- a$id
a[1:6,]
| hhsize | q_636 | q_637 | q_638 | County | id |
|---|---|---|---|---|---|
| 6 | 4 | 2 | Akobo | 107 | |
| 5 | 6 | 6 | Duk | 325 | |
| 4 | Duk | 342 | |||
| 6 | 6 | 6 | Duk | 376 | |
| 7 | 5 | 6 | Duk | 425 | |
| 6 | 6 | 6 | Duk | 510 |
b <- locus_out %>%
mutate(id=1:3563) %>%
filter(id %in% a_id)
b[1:6,]
| hhsize | q_636 | q_637 | q_638 | County | id |
|---|---|---|---|---|---|
| 5.13 | 6 | 4 | 2 | Akobo | 107 |
| 5 | 5.53 | 6 | 6 | Duk | 325 |
| 4 | 4.91 | 5.11 | 5.3 | Duk | 342 |
| 5.96 | 6 | 6 | 6 | Duk | 376 |
| 7 | 5 | 6 | 5.53 | Duk | 425 |
| 5.96 | 6 | 6 | 6 | Duk | 510 |
loc_fa <- locus_out %>%
select(2:4) %>%
fa(cor="poly")
loc_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## q_636 0.59 0.34 0.66 1
## q_637 0.63 0.39 0.61 1
## q_638 0.79 0.62 0.38 1
##
## MR1
## SS loadings 1.36
## Proportion Var 0.45
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 3 and the objective function was 0.56 with Chi Square of 1977
## The degrees of freedom for the model are 0 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic number of observations is 3563 with the empirical chi square 0 with prob < NA
## The total number of observations was 3563 with Likelihood Chi Square = 0 with prob < NA
##
## Tucker Lewis Index of factoring reliability = -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.86
## Multiple R square of scores with factors 0.74
## Minimum correlation of possible factor scores 0.48
Compare to the non-imputed loadings
asp4_fa
## Factor Analysis using method = minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## q_636 0.60 0.35 0.65 1
## q_637 0.63 0.40 0.60 1
## q_638 0.79 0.62 0.38 1
##
## MR1
## SS loadings 1.38
## Proportion Var 0.46
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 3 and the objective function was 0.57 with Chi Square of 2027
## The degrees of freedom for the model are 0 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic number of observations is 3538 with the empirical chi square 0 with prob < NA
## The total number of observations was 3563 with Likelihood Chi Square = 0 with prob < NA
##
## Tucker Lewis Index of factoring reliability = -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.86
## Multiple R square of scores with factors 0.74
## Minimum correlation of possible factor scores 0.48
The loading for q_636 goes down a tad, but otherwise ok
dat <- dat %>%
mutate(Locus = loc_fa$scores)
describe(dat$Locus)
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.56e+03 | -2.51e-16 | 0.865 | 0.283 | 0.125 | 0.557 | -2.7 | 1.03 | 3.73 | -1.28 | 1.1 | 0.0145 |
ggplot(dat, aes(Locus)) +
geom_density(size=1, color="darkblue")
By county
loc_cnty <- dat %>%
group_by(county) %>%
summarize(loc = mean(Locus, na.rm=T),
se = std.error(Locus),
q638=mean(q_638, na.rm=T),
se2 = std.error(q_638, na.rm=T)) %>%
mutate(lower=loc-1.96*se,
upper=loc+1.96*se,
lower2 = q638-1.96*se2,
upper2 = q638+1.96*se2)
loc_cnty
| county | loc | se | q638 | se2 | lower | upper | lower2 | upper2 |
|---|---|---|---|---|---|---|---|---|
| Akobo | -0.157 | 0.039 | 4.29 | 0.0683 | -0.233 | -0.0802 | 4.16 | 4.43 |
| Budi | -0.00318 | 0.0317 | 4.83 | 0.0478 | -0.0653 | 0.0589 | 4.74 | 4.93 |
| Duk | 0.492 | 0.021 | 5.4 | 0.0353 | 0.45 | 0.533 | 5.33 | 5.47 |
| Leer | -0.209 | 0.0369 | 4.29 | 0.0712 | -0.281 | -0.136 | 4.15 | 4.43 |
| Pibor | 0.351 | 0.0155 | 5.24 | 0.0251 | 0.321 | 0.382 | 5.19 | 5.29 |
| Uror | -0.363 | 0.0414 | 4.39 | 0.0687 | -0.444 | -0.281 | 4.25 | 4.52 |
ggplot(loc_cnty, aes(loc, fct_reorder(county, loc))) +
geom_vline(xintercept=0, size=1, alpha=.7, color="grey60") +
# geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") +
geom_point(color="dodgerblue", size=3) +
geom_errorbar(aes(xmin=lower, xmax=upper),
color="dodgerblue",
width=0,
size=1.1) +
scale_x_continuous(limits=c(-1,1),
breaks=seq(-1,1,.5)) +
labs(x="Locus of control index",
y="")
ggplot(loc_cnty, aes(q638, fct_reorder(county, q638))) +
geom_vline(xintercept=mean(dat$q_638, na.rm=T), size=1, alpha=.7, color="darkgoldenrod2") +
#geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") +
geom_point(color="dodgerblue", size=3) +
geom_errorbar(aes(xmin=lower2, xmax=upper2),
color="dodgerblue",
width=0,
size=1.1) +
scale_x_continuous(limits=c(4,6),
breaks=4:6) +
labs(x="My life is determined by my own actions",
y="")
Weighted values by county
loc_cnty_wt <- svyrdat %>%
group_by(county) %>%
summarise(loc = survey_mean(Locus, na.rm=T)) %>%
mutate(lower=loc-1.96*loc_se,
upper=loc+1.96*loc_se)
loc_cnty_wt
| county | loc | loc_se | lower | upper |
|---|---|---|---|---|
| Akobo | -0.166 | 0.102 | -0.367 | 0.0349 |
| Budi | -0.0152 | 0.135 | -0.28 | 0.25 |
| Duk | 0.541 | 0.0837 | 0.377 | 0.705 |
| Leer | -0.144 | 0.0737 | -0.288 | 0.000862 |
| Pibor | 0.36 | 0.0245 | 0.312 | 0.408 |
| Uror | -0.299 | 0.117 | -0.528 | -0.0708 |
ggplot(loc_cnty_wt, aes(loc, fct_reorder(county, loc))) +
geom_vline(xintercept=0, size=1, alpha=.7, color="grey60") +
# geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") +
geom_point(color="dodgerblue", size=3) +
geom_errorbar(aes(xmin=lower, xmax=upper),
color="dodgerblue",
width=0,
size=1.1) +
scale_x_continuous(limits=c(-1,1),
breaks=seq(-1,1,.5)) +
labs(x="Locus of control index",
y="")
## Error in ggplot(loc_cnty_wt, aes(loc, fct_reorder(county, loc))): object 'loc_cnty_wt' not found